11 research outputs found

    Razvoj akustičkog modela hrvatskog jezika pomoću alata HTK (Development of an Acoustic Model for the Croatian Language Using the HTK Toolkit)

    Get PDF
    This paper presents the development of an acoustic model for the Croatian language for automatic speech recognition (ASR). Continuous speech recognition is performed by means of Hidden Markov Models (HMM) implemented in the HMM Toolkit (HTK). To adapt the HTK to the target language, a novel algorithm for Croatian language transcription (CLT) has been developed; it is based on phonetic assimilation rules applied within uttered words. Phonetic questions for state tying of different triphone models have also been developed. An automated system for training and evaluation of acoustic models has been developed and integrated with a new graphical user interface (GUI). The targeted applications of this ASR system are stress inoculation training (SIT) and virtual reality exposure therapy (VRET). Adaptability of the model to a closed set of speakers is important for such applications, and this paper investigates the applicability of the HTK tool for typical scenarios. Robustness of the tool to a new language was tested in matched conditions by parallel training of an English model that was used as a baseline. Ten native Croatian speakers participated in the experiments. Encouraging results were achieved and reported with the developed model for the Croatian language.
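    The abstract does not reproduce the CLT algorithm itself. As a rough illustration of rule-based transcription with within-word phonetic assimilation, the sketch below applies regressive voicing assimilation to a Croatian word. The phone inventory and rule set are simplified assumptions for illustration only; they are not the published CLT rules.

```python
# Minimal sketch of rule-based transcription with regressive voicing assimilation
# within a word (illustrative; not the published CLT rules).

# Simplified voiced/voiceless obstruent pairs (assumed, incomplete).
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k", "z": "s", "ž": "Ŕ", "dž": "č", "đ": "ć"}
VOICELESS_TO_VOICED = {v: k for k, v in VOICED_TO_VOICELESS.items()}
OBSTRUENTS = set(VOICED_TO_VOICELESS) | set(VOICELESS_TO_VOICED) | {"c", "f", "h"}
VOICELESS = set(VOICED_TO_VOICELESS.values()) | {"c", "f", "h"}


def transcribe(word: str) -> list[str]:
    """Rough grapheme-to-phone mapping followed by regressive voicing assimilation."""
    # Treat digraphs as single phones (simplified).
    phones, i = [], 0
    while i < len(word):
        if word[i:i + 2] in ("lj", "nj", "dž"):
            phones.append(word[i:i + 2]); i += 2
        else:
            phones.append(word[i]); i += 1
    # Regressive assimilation: an obstruent takes the voicing of the following obstruent.
    # Right-to-left pass so assimilation propagates through consonant clusters.
    for i in range(len(phones) - 2, -1, -1):
        cur, nxt = phones[i], phones[i + 1]
        if cur in OBSTRUENTS and nxt in OBSTRUENTS:
            if nxt in VOICELESS and cur in VOICED_TO_VOICELESS:
                phones[i] = VOICED_TO_VOICELESS[cur]
            elif nxt not in VOICELESS and cur in VOICELESS_TO_VOICED:
                phones[i] = VOICELESS_TO_VOICED[cur]
    return phones


print(transcribe("predsjednik"))   # 'd' before 's' surfaces as 't': p r e t s j e d n i k
```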

    Croatian Emotional Speech Analyses on a Basis of Acoustic and Linguistic Features

    Get PDF
    Acoustic and linguistic speech features are used to estimate the emotional state of utterances collected within the Croatian emotional speech corpus. Analyses are performed for the classification of five discrete emotions (happiness, sadness, fear, anger and the neutral state) as well as for the estimation of two emotional dimensions, valence and arousal. Acoustic and linguistic cues of emotional speech are analyzed separately and are also combined in two types of fusion: feature-level fusion and decision-level fusion. The Random Forest method is used for all analyses, combined with the Info Gain feature selection method for classification tasks and the Univariate Linear Regression method for regression tasks. The main hypothesis is confirmed: classification accuracy increases in the fusion analyses compared with using the acoustic or linguistic feature sets separately, and the root mean squared error decreases when estimating the emotional dimensions. Most of the other hypotheses are also confirmed, which suggests that the acoustic and linguistic cues of Croatian behave similarly to those of other languages with respect to the emotional impact on speech.
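    As an illustration of the two fusion schemes mentioned above, the sketch below contrasts feature-level fusion (concatenating acoustic and linguistic feature vectors before a single classifier) with decision-level fusion (averaging class probabilities of two separately trained classifiers). The scikit-learn setup and the synthetic feature matrices are assumptions for demonstration; they do not reproduce the corpus, features or settings of the paper.

```python
# Sketch: feature-level vs. decision-level fusion with Random Forest classifiers.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 400
X_acoustic = rng.normal(size=(n, 30))     # stand-in for prosodic/spectral features
X_linguistic = rng.normal(size=(n, 20))   # stand-in for lexical/sentiment features
y = rng.integers(0, 5, size=n)            # 5 emotions: happiness, sadness, fear, anger, neutral

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.25, random_state=0)

# Feature-level fusion: concatenate the feature vectors, train one classifier.
X_fused = np.hstack([X_acoustic, X_linguistic])
clf_fused = RandomForestClassifier(n_estimators=200, random_state=0)
clf_fused.fit(X_fused[idx_train], y[idx_train])
acc_feature_fusion = accuracy_score(y[idx_test], clf_fused.predict(X_fused[idx_test]))

# Decision-level fusion: one classifier per modality, average their class probabilities.
clf_a = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_acoustic[idx_train], y[idx_train])
clf_l = RandomForestClassifier(n_estimators=200, random_state=2).fit(X_linguistic[idx_train], y[idx_train])
proba = (clf_a.predict_proba(X_acoustic[idx_test]) + clf_l.predict_proba(X_linguistic[idx_test])) / 2
pred = clf_a.classes_[np.argmax(proba, axis=1)]   # both classifiers share the same class order
acc_decision_fusion = accuracy_score(y[idx_test], pred)

print(acc_feature_fusion, acc_decision_fusion)
```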

    An Application of Fuzzy Inductive Logic Programming for Textual Entailment and Value Mining

    Get PDF
    The aim of this preliminary report is to give an overview of textual entailment in natural language processing (NLP), to present our research approach and to explain possible applications of such a system. Our system presupposes several modules, namely a sentiment analysis module, an anaphora resolution module, a named entity recognition module and a relationship extraction module. State-of-the-art modules will be used for these, but no original research will be devoted to them. The research focuses on the main module, which extracts background knowledge from the extracted relationships via resolution and inverse resolution (inductive logic programming). The last part focuses on possible economic applications of our research.
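    For intuition about inverse resolution, the sketch below implements the classic "absorption" operator in its crisp, propositional form: given a background clause q <- A and an example clause p <- A ∪ B, it invents the more general clause p <- {q} ∪ B. The clause representation and example literals are hypothetical; the fuzzy and first-order machinery discussed in the report is not reproduced here.

```python
# Sketch: the propositional "absorption" operator of inverse resolution.
# A clause is a pair (head, frozenset_of_body_literals); purely illustrative.

def absorption(c1, c2):
    """Given c1: q <- A and c2: p <- A ∪ B, return c2': p <- {q} ∪ B (or None)."""
    q, a = c1
    p, body = c2
    if not a <= body:            # c1's body must be contained in c2's body
        return None
    return (p, frozenset({q}) | (body - a))

# Example: from "bird <- feathers, lays_eggs" and
# "flies <- feathers, lays_eggs, light" we can invent "flies <- bird, light".
c1 = ("bird", frozenset({"feathers", "lays_eggs"}))
c2 = ("flies", frozenset({"feathers", "lays_eggs", "light"}))
print(absorption(c1, c2))        # ('flies', frozenset({'bird', 'light'}))
```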

    Metodologija estimacije emocionalnih stanja na temelju akustičkih značajki govora (A Methodology for Estimating Emotional States Based on Acoustic Speech Features)

    Get PDF
    In recent years, increasing attention has been devoted to the computational estimation of a person's emotional state from the voice, primarily in the context of developing systems for intelligent human-computer interaction. The paper describes the estimation methodology step by step: extraction of acoustic features of emotional speech, reduction of the feature space, and estimation of emotional states using a machine learning method. Emotions are typically represented either as discrete states, such as happiness, anger, fear or disgust, or as dimensions, most often levels of valence and arousal. Classification methods are used for recognizing discrete emotions, while regression methods are used for estimating the emotional dimensions. The paper gives an overview of state-of-the-art acoustic features for emotion recognition and presents the results of relevant work in this area.
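    The methodology described above (feature extraction, feature-space reduction, then classification for discrete emotions or regression for the dimensions) can be sketched as a simple pipeline. The synthetic feature matrix and the particular methods chosen here (SelectKBest, Random Forest) are illustrative assumptions, not a prescription from the paper.

```python
# Sketch of the described pipeline: features -> feature-space reduction -> estimator.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif, f_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 120))            # e.g. 120 acoustic features per utterance
y_discrete = rng.integers(0, 4, size=300)  # discrete emotions (happiness, anger, fear, disgust)
y_arousal = rng.uniform(-1, 1, size=300)   # one emotional dimension, e.g. arousal

# Classification branch: discrete emotional states.
clf = Pipeline([("reduce", SelectKBest(f_classif, k=30)),
                ("model", RandomForestClassifier(n_estimators=200, random_state=0))])
clf.fit(X, y_discrete)

# Regression branch: emotional dimensions such as valence or arousal.
reg = Pipeline([("reduce", SelectKBest(f_regression, k=30)),
                ("model", RandomForestRegressor(n_estimators=200, random_state=0))])
reg.fit(X, y_arousal)

print(clf.predict(X[:3]), reg.predict(X[:3]))
```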

    COMPUTER-AIDED PSYCHOTHERAPY BASED ON MULTIMODAL ELICITATION, ESTIMATION AND REGULATION OF EMOTION

    Get PDF
    Contemporary psychiatry looks to the affective sciences to understand human behavior, cognition and the mind in health and disease. Since it has been recognized that emotions play a pivotal role for the human mind, an ever increasing number of laboratories and research centers are interested in affective sciences, affective neuroscience, affective psychology and affective psychopathology. This paper therefore presents multidisciplinary research results on stress resilience from the Laboratory for Interactive Simulation System at the Faculty of Electrical Engineering and Computing, University of Zagreb. A patient's distortions in the emotional processing of multimodal input stimuli are predominantly a consequence of cognitive deficits resulting from the patient's individual mental health disorder. These emotional distortions in the patient's multimodal physiological, facial, acoustic and linguistic features, related to the presented stimulation, can be used as an indicator of the patient's mental illness. Real-time processing and analysis of the patient's multimodal responses to annotated input stimuli is based on appropriate machine learning methods. Comprehensive longitudinal multimodal analysis of the patient's emotions, mood, feelings, attention, motivation, decision-making and working memory, synchronized with the multimodal stimuli, provides an extremely valuable database for data mining, machine learning and machine reasoning. The presented multimedia stimulus sequence includes personalized images, movies and sounds, as well as semantically congruent narratives. Simultaneously with the stimulus presentation, the patient provides subjective emotional ratings of the presented stimuli in terms of subjective units of discomfort/distress, discrete emotions, or valence and arousal. These subjective ratings of the input stimuli, together with the corresponding physiological, speech and facial output features, provide enough information to evaluate the patient's cognitive appraisal deficit. Aggregated real-time visualization of this information provides valuable assistance in diagnosing the patient's mental state, giving the therapist deeper and broader insight into the dynamics and progress of the psychotherapy.
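    A small sketch of the synchronization step described above: cutting response epochs out of a continuous physiological channel around annotated stimulus onsets and pairing each epoch with the subjective rating given for that stimulus. The signal, sampling rate, stimulus log and the toy epoch feature are placeholders, not the laboratory's actual pipeline.

```python
# Sketch: pairing annotated stimuli with response epochs and subjective ratings.
import numpy as np

fs = 256                                  # assumed sampling rate (Hz)
signal = np.random.randn(fs * 600)        # placeholder: 10 minutes of one physiological channel

# Hypothetical stimulus log: onset time (s), stimulus id, subjective rating (e.g. SUD 0-100).
stimulus_log = [(12.0, "img_neutral", 10), (45.5, "img_trauma_cue", 85), (90.2, "narrative_1", 60)]

def epoch(sig, onset_s, pre_s=1.0, post_s=6.0):
    """Cut a response window around a stimulus onset (pre_s before to post_s after)."""
    start = int((onset_s - pre_s) * fs)
    stop = int((onset_s + post_s) * fs)
    return sig[max(start, 0):stop]

dataset = [{"stimulus": sid,
            "rating": rating,
            "response_mean": float(epoch(signal, t).mean())}   # toy per-epoch feature
           for t, sid, rating in stimulus_log]
print(dataset)
```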

    Emotional state estimation based on data mining of acoustic speech features

    No full text
    The estimation of emotional states from speech can play an important role in many areas. Within this doctoral thesis, a system for the estimation of emotional states based on acoustic features of the speech signal was realized; it can find application in psychotherapy as well as in the selection and training of candidates for stressful and high-responsibility operations. Because of this potential, particular emphasis is placed on the estimation of speech under stress and on eliciting responses from subjects with startle stimuli. The neurobiological basis of emotion is explored, as well as the influence of emotions on the biological mechanisms of speech production and, consequently, on individual acoustic parameters and voice features. Voice perturbation measures are proposed, i.e. features of the influence of limbic structures on disturbances in the coordination of the antagonistic process of vocal fold vibration, which showed significant discriminability with respect to the level of stress in the voice. Their robustness to voluntary components of speech, specifically the dynamics of the fundamental frequency during pronunciation, was also established, whereas conventional perturbation measures (jitter) proved less successful in this respect. The influence of intense impulse-like acoustic stimuli, i.e. startle stimuli, on changes in the fundamental frequency of the voice was analyzed. So-called fear-potentiated startle reactions are widely used in the diagnostics of post-traumatic stress disorder, i.e. in fear conditioning and extinction paradigms. Electromyography of the orbicularis oculi muscle, i.e. eyeblink analysis, is used today as the conventional measure of startle reactions. In this dissertation, a comparative analysis of the response in the fundamental frequency and the response of the orbicularis oculi muscle was performed, and consistent and similar response properties were established. Furthermore, an improvement of the conventional architecture of systems for estimating the emotional dimensions of valence and arousal, using a priori knowledge about the relationship between these dimensions, is proposed. The analyses confirmed an improvement in estimation accuracy when this architecture is used.
    This doctoral thesis is the result of research on the project "Adaptive Control of Scenarios in VR Therapy of PTSD", which aims to develop a collaborative and intelligent agent that, as decision-making support, could be applicable in a number of areas such as prediction, selection, diagnosis and the treatment of mental disorders, especially those caused by stress. The thesis explores the problem of estimating emotional states, stress and acoustic startle responses on the basis of acoustic speech features. Emphasis is placed on evaluating the features using statistical analysis methods in the context of the aforementioned problems. New voice perturbation features that describe the impact of limbic structures on the neural regions responsible for coordinating the antagonistic process of vocal fold vibration are proposed and evaluated. A comparative analysis of changes in the speech fundamental frequency (F0) with the electromyographic (EMG) response of the orbicularis oculi muscle was performed. The thesis also proposes an improvement of the conventional system architecture for estimating the emotional dimensions of valence and arousal, using a priori knowledge about the relation between these two dimensions. The introductory chapter defines the domains, motivation and objectives of the research, noting the inherent interdisciplinarity of the research field. The scientific contributions and the structure of the dissertation are also defined in this chapter.
In the second chapter, the neurobiological processes through which emotions affect speech production mechanisms are described. The influence of emotions on the respiration, phonation and articulation mechanisms of speech is explored. Special attention is given to the internal muscles of the larynx, i.e. the phonation mechanisms, whose sensitive structures make them the most vulnerable to the impact of emotions. The acoustic speech features commonly used for the estimation of emotional states and stress are described in the third chapter. Furthermore, a decomposition of the speech fundamental frequency is proposed, whose components selectively capture specific neurobiological processes of emotions. Speech perturbation features are proposed that describe the temporal and amplitude aspects of the disturbance of vocal fold oscillation, which is a consequence of the limbic system's influence on the cerebellum and brainstem. The proposed features are validated on artificially generated speech perturbations and on speech under stress. In most cases, the proposed features showed statistically significant differences with respect to the level of speech perturbation and the level of stress. Furthermore, they showed satisfactory robustness to the voluntary component of pronunciation, in particular the dynamics of the fundamental frequency, which is their main benefit over conventional speech perturbation measures (e.g. jitter measures). In the fourth chapter, F0 features are validated in the context of the acoustic startle response. Features such as peak value, peak time and duration are validated with respect to changes in the parameters of the startle stimulus (intensity, duration, rise time and spectral characteristics), as well as with respect to the existence and intensity of the startle response. A comparative analysis is performed between the F0 response features and the EMG features of the orbicularis oculi muscle response (eyeblink), which is considered the reference measure for startle reaction analysis. The analyses showed similar behavior of the F0 and EMG responses when the intensity of the startle stimulus is varied. In both cases the highest statistically significant difference is achieved for the response peak value, and a significant increasing trend in the peak values of the F0 and EMG responses was observed with increasing stimulus intensity at higher stimulus intensities. In the fifth chapter, the methodology of emotional state estimation based on acoustic speech features is described; it is conventionally performed through four sequential steps: speech signal processing with the extraction of acoustic measures; feature calculation from the acoustic measures; reduction of the feature space; and estimation of emotional states using machine learning methods. An upgrade of the conventional architecture for estimating the emotion dimensions of valence and arousal, based on the a priori relationship between the two dimensions, is proposed in this thesis. The a priori model is applied to the conventional estimation process in order to shift the estimates in the valence-arousal space toward more probable values, according to the level of estimation uncertainty. Different approaches to modeling the a priori knowledge are taken: (a) a single integral model over the valence-arousal space, and (b) an integration of multiple models that represent different discrete emotions in the valence-arousal space, specifically happiness, sadness, fear, anger and the neutral state.
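For context on the conventional perturbation measures that the proposed features are compared against, the sketch below computes local jitter, i.e. the mean absolute difference of consecutive glottal periods relative to the mean period. The period values are synthetic, and this is the standard textbook definition rather than the thesis's proposed perturbation features.

```python
# Sketch: conventional local jitter from consecutive glottal period estimates.
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive periods divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

# Synthetic periods around 8 ms (F0 ~ 125 Hz) with small cycle-to-cycle perturbation.
rng = np.random.default_rng(2)
periods_s = 0.008 + rng.normal(scale=0.00005, size=200)
print(f"local jitter: {100 * local_jitter(periods_s):.2f} %")
```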
The emotional state estimation system is built and validated using utterances from the Croatian emotional speech corpus, which was collected and annotated in collaboration with the University of Zagreb, Faculty of Humanities and Social Sciences. In the sixth chapter, the machine learning methods, specifically support vector machines and random forests, are validated for the estimation of emotional states, stress and startle responses. In this context, the improvements proposed in the thesis are compared with conventional approaches from the literature. The results justify the introduction of the new perturbation speech features for the classification of speech under stress, the application of F0 features to startle response analysis, and the proposed enhanced method for the estimation of emotional states. The last chapter concludes the doctoral thesis and provides suggestions for future related research; specific applications of the proposed methods are also discussed.
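The a priori upgrade described in the fifth chapter shifts a conventional valence-arousal estimate toward a priori more probable regions, with the size of the shift growing with the estimator's uncertainty. Below is a minimal sketch of that idea, assuming a single prior center over the valence-arousal plane and an uncertainty-weighted convex combination; the exact model and weighting used in the thesis are not reproduced here.

```python
# Sketch: shifting a valence-arousal estimate toward an a priori more probable region,
# weighted by the estimator's uncertainty (0 = fully certain, 1 = fully uncertain).
import numpy as np

prior_mean = np.array([0.1, 0.0])    # assumed center of the a priori V-A distribution

def adjust_estimate(va_estimate, uncertainty, prior_mean=prior_mean):
    """Convex combination of the raw estimate and the prior mean."""
    w = np.clip(uncertainty, 0.0, 1.0)
    return (1.0 - w) * np.asarray(va_estimate, dtype=float) + w * prior_mean

print(adjust_estimate([0.8, -0.6], uncertainty=0.2))   # mostly keeps the raw estimate
print(adjust_estimate([0.8, -0.6], uncertainty=0.8))   # pulled strongly toward the prior
```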
